VARIANCE ESTIMATION IN NONPARAMETRIC REGRESSION
VIA THE DIFFERENCE SEQUENCE METHOD 

By Lawrence D. Brown 1 and M. Levine 1,2 

University of Pennsylvania and Purdue University 

Consider a Gaussian nonparametric regression problem having 
both an unknown mean function and unknown variance function. 
This article presents a class of difference-based kernel estimators for 
the variance function. Optimal convergence rates that are uniform 
over broad functional classes and bandwidths are fully characterized, 
and asymptotic normality is also established. We also show that for 
suitable asymptotic formulations our estimators achieve the minimax 
rate. 

1. Introduction. Let us consider the nonparametric regression problem

(1)  $y_i = g(x_i) + \sqrt{V(x_i)}\,\varepsilon_i, \qquad i = 1, \ldots, n,$

where $g(x)$ is an unknown mean function, the errors $\varepsilon_i$ are i.i.d. with mean
zero, variance 1 and finite fourth moment $\mu_4 < \infty$, while the design is
fixed. We assume that $\max\{x_{i+1} - x_i\} = O(n^{-1})$ for all $i = 0, \ldots, n$. Also, the
usual convention $x_0 = 0$ and $x_{n+1} = 1$ applies. The problem we are interested
in is estimating the variance $V(x)$ when the mean $g(x)$ is unknown.
In other words, the mean $g(x)$ plays the role of a nuisance parameter. The
problem of variance estimation in nonparametric regression was first seri- 
ously considered in the 1980s. The practical importance of this problem has 
been also amply illustrated. It is needed to construct a confidence band for 
any mean function estimate (see, e.g., Hart [24], Chapter 4). It is of interest 
in confidence interval determination for turbulence modeling (Ruppert et al. 
[34]), financial time series (Härdle and Tsybakov [23], Fan and Yao [18]), covariance
structure estimation for nonstationary longitudinal data (see, e.g.,
Diggle and Verbyla [10]), estimating correlation structure of heteroscedastic
spatial data (Opsomer et al. [31]), nonparametric regression with lognormal 
errors as discussed in Brown et al. [2] and Shen and Brown [36], and many 
other problems. 

In what follows we describe in greater detail the history of a particular
approach to the problem. Von Neumann [40, 41] and then Rice [33] considered
the special, homoscedastic situation in which $V(x) \equiv \sigma^2$ in the model
(1) but $\sigma^2$ is unknown. They proposed relatively simple estimators of the
form

(2)  $\hat V(x) = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (y_{i+1} - y_i)^2.$

The next logical step was made in Gasser, Sroka and Jennen-Steinmetz [19],
where three neighboring points were used to estimate the variance,

(3)  $\hat V(x) = \frac{2}{3(n-2)} \sum_{i=1}^{n-2} \left( \frac{1}{2} y_i - y_{i+1} + \frac{1}{2} y_{i+2} \right)^2.$
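The two constant-variance estimators (2) and (3) can be sketched in a few lines. This is a minimal illustration that assumes equally spaced design points; the function names `rice_variance` and `gsj_variance` are hypothetical labels, not taken from the cited papers.

```python
import numpy as np


def rice_variance(y):
    """First-difference estimator (2): squared successive differences / (2(n-1))."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 2:
        raise ValueError("need at least two observations")
    return np.sum(np.diff(y) ** 2) / (2.0 * (n - 1))


def gsj_variance(y):
    """Three-point estimator (3); assumes an equally spaced design."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least three observations")
    # Pseudoresiduals y_i / 2 - y_{i+1} + y_{i+2} / 2, which vanish for a linear mean.
    pseudo = 0.5 * y[:-2] - y[1:-1] + 0.5 * y[2:]
    return 2.0 * np.sum(pseudo ** 2) / (3.0 * (n - 2))
```

Because the three-point pseudoresiduals annihilate linear trends, `gsj_variance` returns zero on exactly linear noiseless data, whereas `rice_variance` does not.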

A further general step was made in Hall, Kay and Titterington [21]. The 
following definition is needed first. 

Definition 1.1. Let us consider a sequence of numbers $\{d_i\}_{i=0}^{r}$ such
that

(4)  $\sum_{i=0}^{r} d_i = 0,$

while

(5)  $\sum_{i=0}^{r} d_i^2 = 1.$

Such a sequence is called a difference sequence of order r.

For example, when r = 1, we have $d_0 = 1/\sqrt{2}$, $d_1 = -d_0$, which defines the
first difference $\Delta Y = (y_i - y_{i+1})/\sqrt{2}$. The estimator of Hall, Kay and Titterington
[21] can be defined as

(6)  $\hat V(x) = (n-r)^{-1} \sum_{i=1}^{n-r} \left( \sum_{j=0}^{r} d_j y_{j+i} \right)^2.$

The conditions (4) and (5) are meant to ensure the unbiasedness of the
estimator (6) when g is constant and also the identifiability of the sequence
$\{d_j\}$.
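A minimal sketch of the difference-sequence estimator (6) follows. The helper names `is_difference_sequence` and `difference_variance` are hypothetical; the code only checks conditions (4) and (5) numerically and then averages the squared differences.

```python
import numpy as np


def is_difference_sequence(d, tol=1e-10):
    """Check conditions (4) and (5): the d_j sum to zero and their squares sum to one."""
    d = np.asarray(d, dtype=float)
    return abs(d.sum()) < tol and abs(np.sum(d ** 2) - 1.0) < tol


def difference_variance(y, d):
    """Estimator (6) for a homoscedastic model, given a difference sequence d of order r."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    if not is_difference_sequence(d):
        raise ValueError("d must satisfy conditions (4) and (5)")
    r = d.size - 1
    n = y.size
    if n <= r:
        raise ValueError("need more observations than the order r")
    # pseudo[i] = sum_j d[j] * y[i + j], for i = 0, ..., n - r - 1
    pseudo = np.correlate(y, d, mode="valid")
    return np.sum(pseudo ** 2) / (n - r)
```

With `d = (1/sqrt(2), -1/sqrt(2))` this reduces to the first-difference estimator (2).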
A different direction was taken in Hall and Carroll [20] and Hall and
Marron [22] where the variance was estimated by an average of squared 
residuals from a fit to g; for other work on constant variance estimation, 
see also Buckley, Eagleson and Silverman [5], Buckley and Eagleson [4] and 
Carter and Eagleson [7]. 

The difference sequence idea introduced by Hall, Kay and Titterington 
[21] can be modified for the case of a nonconstant variance function V(x). As 
a rule, the average of squared differences of observations has to be localized 
in one way or another — for example, by using the nearest neighbor average, a 
spline approach or local polynomial regression. The first to try to generalize 
it in this way were probably Müller and Stadtmüller [27]. It was further
developed in Hall, Kay and Titterington [21], Müller and Stadtmüller [28],
Seifert, Gasser and Wolf [35], Dette, Munk and Wagner [9], and many others. 
An interesting application of this type of a variance function estimator for 
the purpose of testing the functional form of the given regression model is 
given in Dette [8]. 

Another possible route to estimating the variance function $V(x)$ is to use
the local average of the squared residuals from the estimation of $g(x)$. One of
the first applications of this principle was in Hall and Carroll [20]. A closely 
related estimator was also considered earlier in Carroll [6] and Matloff, Rose 
and Tai [26]. This approach has also been considered in Fan and Yao [18]. 

Some of the latest work in the area of variance estimation includes attempts
to derive methods that are suitable for the case where $X \in \mathbb{R}^d$ for
$d > 1$; see, for example, Spokoiny [38] for generalization of the residual-based
method and Munk, Bissantz, Wagner and Freitag [29] for generalization of 
the difference-based method. 

The present research describes a class of nonparametric variance esti- 
mators based on difference sequences and local polynomial estimation, and 
investigates their asymptotic behavior. Section 2 introduces the estimator 
class and investigates its asymptotic rates of convergence as well as the 
choice of the optimal bandwidth. Section 3 establishes the asymptotic nor- 
mality of these estimators. Section 4 investigates the question of asymptotic 
minimaxity for our estimator class among all possible variance estimators 
for nonparametric regression. 

2. Variance function estimators. Consider the model (1). We begin with 
the following formal definition. 

Definition 2.1. A pseudoresidual of order r is

(7)  $\Delta_i = \Delta_{r,i} = \sum_{j=0}^{r} d_j y_{j+i-\lfloor r/2 \rfloor},$

where $\{d_j\}$ is a difference sequence satisfying (4)-(5) and
$i = \lfloor r/2 \rfloor + 1, \ldots, n + \lfloor r/2 \rfloor - r$.

Let $K(\cdot)$ be a real-valued function such that $K(u) \ge 0$ and is not identically
zero; $K(u)$ is bounded [$\exists M > 0$ such that $K(u) \le M$ for $\forall u$]; $K(u)$ is
supported on $[-1, 1]$ and $\int K(u)\,du = 1$. We use the notation $\sigma_K^2 = \int u^2 K(u)\,du$
and $R_K = \int K^2(u)\,du$. Then, based on $\Delta_{r,i}$, we define a variance estimator
$\hat V_h(x)$ of order r as the local polynomial regression estimator based on $\Delta_{r,i}^2$,

(8)  $\hat V_h(x) = \hat a_0,$

where

$(\hat a_0, \hat a_1, \ldots, \hat a_p) = \arg\min_{a_0, a_1, \ldots, a_p} \sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} [\Delta_{r,i}^2 - a_0 - a_1 (x - x_i) - \cdots - a_p (x - x_i)^p]^2 K\left( \frac{x - x_i}{h} \right).$

The value h in (8) is called the bandwidth and K is the weight function.
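The estimator (8) can be sketched as a kernel-weighted least squares fit to the squared pseudoresiduals. The following is an illustrative implementation under stated assumptions: the Epanechnikov kernel is one admissible choice of K (the text does not fix a kernel), and the names `epanechnikov` and `local_poly_variance` are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """A nonnegative kernel supported on [-1, 1] that integrates to one."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def local_poly_variance(x0, x, y, d, h, p=1, kernel=epanechnikov):
    """Local polynomial fit of degree p to squared pseudoresiduals, evaluated at x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    r = d.size - 1
    n = y.size
    # delta[k] = sum_j d[j] * y[k + j]; it is attached to the design point x[k + r // 2],
    # which matches the index shift by floor(r/2) in (7) (0-based indexing here).
    delta = np.correlate(y, d, mode="valid")
    centres = x[r // 2: r // 2 + n - r]
    z = delta ** 2
    w = kernel((x0 - centres) / h)
    if np.count_nonzero(w) <= p:
        raise ValueError("bandwidth too small: not enough points in the window")
    # Columns (x0 - x_i)^k for k = 0, ..., p; the intercept estimates V(x0).
    X = np.vander(x0 - centres, N=p + 1, increasing=True)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    return coef[0]
```

When all squared pseudoresiduals are equal to a constant, the fit returns that constant for any degree p, which gives a quick sanity check of the indexing.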

It should be clear that these estimators are unbiased under the assumption
of homoscedasticity $V(x) \equiv \sigma^2$ and constant mean $g(x) \equiv \mu$. We begin with
the definition of the functional class that will be used in the asymptotic 
results to follow. 



Definition 2.2. Define the functional class $C_\gamma$ as follows. Let $C_1 > 0$,
$C_2 > 0$. Let us denote $\gamma' = \gamma - \lfloor\gamma\rfloor$ where $\lfloor\gamma\rfloor$ denotes the greatest integer
less than $\gamma$. We say that the function $f(x)$ belongs to the class $C_\gamma$ if for all
$x, y \in (0, 1)$

(9)  $|f^{(\lfloor\gamma\rfloor)}(x) - f^{(\lfloor\gamma\rfloor)}(y)| \le C_1 |x - y|^{\gamma'},$

(10)  $|f^{(k)}(x)| \le C_2,$

for $k = 0, \ldots, \lfloor\gamma\rfloor - 1$. Note that $C_\gamma$ depends on the choice of $C_1$, $C_2$, but for
our convenience we omit this dependence from the notation. There are also
similar types of dependence in the definitions that immediately follow.

Definition 2.3. Let $\delta > 0$. We say the function is in class $C_\gamma^+$ if it is in
$C_\gamma$ and in addition

(11)  $f(x) \ge \delta.$

These classes of functions are familiar in the literature, as in Fan [15, 16] 
and are often referred to as Lipschitz balls. 



SEQUENCE-BASED VARIANCE ESTIMATION 




Definition 2.4. Define the pointwise risk of the variance estimator
$\hat V_h(x)$ (its mean squared error at a point x) as

$R(V(x), \hat V_h(x)) = E[\hat V_h(x) - V(x)]^2.$

Definition 2.5. Define the global mean squared risk of the variance
estimator $\hat V_h(x)$ as

(12)  $R(V, \hat V_h) = E\left( \int_0^1 (\hat V_h(x) - V(x))^2\,dx \right).$

Then the bandwidth $h_n$ that is globally optimal in the minimax sense is defined
as

$h_n = \arg\min\{\sup\{R(V, \hat V_h) : V \in C_\gamma^+, g \in C_\beta\} : h > 0\}.$

Note that $h_n$ depends on n as well as $C_1$, $C_2$, $\beta$ and $\gamma$. A similar definition
applies in the setting of Definition 2.4.

Remark 2.6. In the special case where $\gamma = 2$ and $\beta = 1$, the finite sample
performance of this estimator has been investigated in Levine [25] together
with the possible choice of bandwidth. A version of K-fold cross-validation
has been recommended as the most suitable method. When utilized,
it produces a variance estimator that in typical cases is not very sensitive
to the choice of the mean function $g(x)$.

Theorem 2.7. Consider the nonparametric regression problem described
by (1), with estimator as described in (8). Fix $C_1$, $C_2$, $\gamma > 0$ and $\beta >
\gamma/(4\gamma + 2)$ to define functional classes $C_\gamma^+$ and $C_\beta$ according to
Definition 2.2. Assume $p > \lfloor\gamma\rfloor$. Then the optimal bandwidth is $h_n \asymp n^{-1/(2\gamma+1)}$.
Let $0 < \underline{a} < \overline{a} < \infty$. Then there are constants $\underline{B}$ and $\overline{B}$ such that

(13)  $\underline{B}\, n^{-2\gamma/(2\gamma+1)} + o(n^{-2\gamma/(2\gamma+1)}) \le R(V, \hat V_h) \le \overline{B}\, n^{-2\gamma/(2\gamma+1)} + o(n^{-2\gamma/(2\gamma+1)})$

for all h satisfying $\underline{a} \le n^{1/(2\gamma+1)} h \le \overline{a}$, uniformly for $g \in C_\beta$, $V \in C_\gamma^+$.
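The exponents appearing in the theorem are simple functions of $\gamma$. The small helper below (a hypothetical name, `theorem_rates`) tabulates them with exact rational arithmetic; it only restates the formulas above and assumes $\gamma$ is supplied as an integer, a string such as `"3/2"`, or a `Fraction`.

```python
from fractions import Fraction


def theorem_rates(gamma):
    """Exponents of n for the optimal bandwidth and risk, and the lower limit on beta."""
    g = Fraction(gamma)
    return {
        "bandwidth_exponent": -1 / (2 * g + 1),   # h_n is of order n^{-1/(2 gamma + 1)}
        "risk_exponent": -2 * g / (2 * g + 1),    # risk is of order n^{-2 gamma/(2 gamma + 1)}
        "beta_threshold": g / (4 * g + 2),        # the theorem requires beta above this value
    }
```

For example, $\gamma = 2$ gives a bandwidth of order $n^{-1/5}$, a risk of order $n^{-4/5}$, and the requirement $\beta > 1/5$.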

Theorem 2.7 refers to properties of the integrated mean square error. 
Related results also hold for minimax risk at a point. The main results are 
stated in the following theorem. 

Theorem 2.8. Consider the setting of Theorem 2.7. Let $x_0 \in (0, 1)$.
Assume $p > \lfloor\gamma\rfloor$. Then the optimal bandwidth is $h_n(x_0) \asymp n^{-1/(2\gamma+1)}$. Let
$0 < \underline{a} < \overline{a} < \infty$. Then there are constants $\underline{B}$ and $\overline{B}$ such that

(14)  $\underline{B}\, n^{-2\gamma/(2\gamma+1)} + o(n^{-2\gamma/(2\gamma+1)}) \le R(V(x_0), \hat V_{h_n}(x_0)) \le \overline{B}\, n^{-2\gamma/(2\gamma+1)} + o(n^{-2\gamma/(2\gamma+1)})$

for all $h(x_0)$ satisfying $\underline{a} \le n^{1/(2\gamma+1)} h(x_0) \le \overline{a}$, uniformly for $g \in C_\beta$, $V \in C_\gamma^+$.



The proof of these theorems can be found in the Appendix. The minimax 
rates obtained in (13) and (14) will be shown in Theorems 4.1 and 4.2 to be 
optimal in the setting of Theorem 2.7. At this point, the following remarks 
may be helpful. 

Remark 2.9. If one assumes that $\beta = \gamma/(4\gamma + 2)$ in the definition of the
functional class $C_\beta$, the conclusions of Theorems 2.7 and 2.8 remain valid,
but the constants $\underline{B}$ and $\overline{B}$ appearing in them become dependent on $\beta$.
If $\beta < \gamma/(4\gamma + 2)$, the conclusion (14) does not hold. For more details, see
comments preceding Theorem 4.2 and the Appendix.

Remark 2.10. Müller and Stadtmüller [28] considered the general
quadratic form based estimator similar to our (8) and derived convergence
rates for its mean squared error. They also were the first to point out an
error in the paper by Hall and Carroll [20] (see Müller and Stadtmüller [28],
pages 214 and 221). They use a slightly different (more restrictive) definition
of the classes $C_\gamma$ and $C_\beta$ and only establish rates of convergence and error
terms on those rates for fixed functions V and g within the classes $C_\gamma$ and
$C_\beta$. Our results resemble these but we also establish the rates of convergence
uniformly over the functional classes $C_\beta$ and $C_\gamma$ and therefore our bounds
are of the minimax type.

Remark 2.11. It is important to notice that the asymptotic mean
squared risks in Theorems 2.7 and 2.8 can be further reduced by proper
choice of the difference sequence $\{d_j\}$. The proof in the Appendix supplemented
with material in Hall, Kay and Titterington [21] shows that the
asymptotic variance of our estimators will be affected by the choice of the
difference sequence, but the choice of this sequence does not affect the bias in
asymptotic calculations. The effect on the asymptotic variance is to multiply
it by a constant proportional to

(15)

For any given value of r there is a difference sequence that minimizes this
constant. A computational algorithm for these sequences is given in Hall,
Kay and Titterington [21]. The resulting minimal constant as a function of
r is $C_{\min} = (2r + 1)/r$.
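As a rough numerical check of the minimal constant $(2r + 1)/r$, the sketch below evaluates one candidate form of the constant, $C(d) = 2 + 4 \sum_{k=1}^{r} (\sum_{j=0}^{r-k} d_j d_{j+k})^2$. This form is an assumption (the Gaussian-error expression usually associated with Hall, Kay and Titterington [21]) and `variance_constant` is a hypothetical helper name. The order-2 sequence used in the usage note is chosen so that all of its lagged products equal $-1/(2r)$.

```python
import numpy as np


def variance_constant(d):
    """Assumed constant C(d) = 2 + 4 * sum_k (sum_j d_j d_{j+k})^2 for a sequence d."""
    d = np.asarray(d, dtype=float)
    r = d.size - 1
    # Lag-k products sum_j d_j d_{j+k} for k = 1, ..., r
    lagged = [float(np.dot(d[: d.size - k], d[k:])) for k in range(1, r + 1)]
    return 2.0 + 4.0 * sum(c ** 2 for c in lagged)
```

Under this assumed form, the first-difference sequence $(1/\sqrt{2}, -1/\sqrt{2})$ gives 3, and the sequence $((1 + \sqrt{5})/4, -1/2, (1 - \sqrt{5})/4)$ gives 5/2, both in agreement with $(2r + 1)/r$.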







3. Asymptotic normality. As a next step, we establish that the estimator
(8) is asymptotically normal. We recall that the local polynomial regression
estimator $\hat V_h(x)$ can be represented as

(16)  $\hat V_h(x) = \sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} K_{n;h,x}(x_i)\,\Delta_{r,i}^2,$

where $K_{n;h,x}(x_i) = K_{n,x}\left( \frac{x - x_i}{h} \right)$. Here $K_{n,x}\left( \frac{x - x_i}{h} \right)$ can be thought of as a
centered and rescaled nonnegative local kernel function whose shape depends
on the location of design points $x_i$, the point of estimation x and the number
of observations n. We know that $K_{n,x}\left( \frac{x - x_i}{h} \right)$ satisfies discrete moment
conditions,

(17)  $\sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} K_{n,x}\left( \frac{x - x_i}{h} \right) = 1,$

(18)  $\sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} (x - x_i)^q K_{n,x}\left( \frac{x - x_i}{h} \right) = 0$

for any $q = 1, \ldots, p$. We also need the fact that the support of $K_n(\cdot)$ is
contained in that of $K(\cdot)$; in other words, $K_{n;h,x}(x_i) = 0$ whenever $|x_i - x| > h$.
For more details see, for example, Fan and Gijbels [17]. Now we can state
the following result.
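The weights in the representation (16) can be computed explicitly as the first row of the weighted least squares hat matrix, and the discrete moment conditions checked numerically. The sketch below assumes an Epanechnikov kernel and uses the hypothetical name `equivalent_weights`; it illustrates the standard local polynomial construction and is not code from the paper.

```python
import numpy as np


def equivalent_weights(x0, centres, h, p=1):
    """Weights k_i such that the local polynomial intercept equals sum_i k_i * z_i."""
    centres = np.asarray(centres, dtype=float)
    u = (x0 - centres) / h
    # Epanechnikov kernel weights; zero outside the window |x_i - x0| <= h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    X = np.vander(x0 - centres, N=p + 1, increasing=True)
    XtW = X.T * w                      # X' W, shape (p + 1, m)
    e1 = np.zeros(p + 1)
    e1[0] = 1.0
    # First row of (X' W X)^{-1} X' W; X' W X is symmetric, so one solve suffices.
    return np.linalg.solve(XtW @ X, e1) @ XtW
```

The resulting weights sum to one, annihilate $(x - x_i)^q$ for $q = 1, \ldots, p$, and vanish outside the bandwidth window.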

Theorem 3.1. Consider the nonparametric regression problem described
by (1), with estimator as described in (8). We assume that the functions $g(x)$
and $V(x)$ are continuous for any $x \in [0, 1]$ and V is bounded away from zero.
Assume $\mu_{4+\nu} = E(\varepsilon_i)^{4+\nu} < \infty$ for some $\nu > 0$. Then, as $h \to 0$, $n \to \infty$ and
$nh \to \infty$, we find that

(19)  $\sqrt{nh}\,(\hat V_h(x) - V(x) - O(h^{2\gamma}))$

is asymptotically normal with mean zero and variance $\sigma^2$ where $0 < \sigma^2 < \infty$.

Proof. To prove this result, we rely on the CLT for partial sums of a
generalized linear process

(20)  $X_n = \sum_{i=1}^{n} a_{ni} \xi_i,$

where $\xi_i$ is a mixing sequence. This and several similar results were established
in Peligrad and Utev [32]. The estimator (8) can be easily
represented in the form (20) with $K_{n;h,x}(x_i)$ as $a_{ni}$. What remains is to verify
the conditions of Theorem 2.2(c) in Peligrad and Utev [32].

• The first condition is

(21)  $\max_{1 \le i \le n} |a_{ni}| \to 0$

as $n \to \infty$ and it is immediately satisfied since

(22)  $K_{n;h,x}(x_i) = O((nh)^{-1})$

uniformly for all $x \in [0, 1]$.

• The second condition is

(23)  $\sup_n \sum_{i=1}^{n} a_{ni}^2 < \infty.$

It can be verified by using the Cauchy-Schwarz inequality and (22).

• To establish uniform integrability of $\xi_i^2 = \Delta_i^4$, we use a simple criterion
mentioned in Shiryaev [37] that requires existence of a nonnegative,
monotonically increasing function $G(t)$, defined for $t > 0$, such that

$\lim_{t \to \infty} \frac{G(t)}{t} = \infty$

and

$\sup_i E[G(\Delta_i^4)] < \infty.$

It is enough to choose $G(t) = t^{1+\nu}$ for small $\nu > 0$ to have this condition
satisfied. Finally, the remaining three conditions of Peligrad and Utev
[32] are trivially satisfied. □



4. Asymptotic minimaxity and related issues. Lower bounds on the asymp- 
totic minimax rate for estimating a nonparametric variance in formulations 
related to that in (1) have occasionally been studied in earlier literature. 
Two papers seem particularly relevant. Munk and Ruymgaart [30] study a 
different, but related problem. Their paper contains a lower bound on the 
asymptotic minimax risk for their setting. In particular, their setting in- 
volves a problem with random design, rather than the fixed design case in 
(1). Their proof uses the Van Trees inequality and relies heavily on the fact 
that their (Xi,Yi) pairs are independent and identically distributed. While 
it may well be possible to do so, it is not immediately evident how to modify 
their argument to apply to the setting (1). 

Hall and Carroll [20] consider a setting similar to ours. Their equation
(2.13) claims (in our notation) that there is a constant $K > 0$, possibly
depending on $C_1$, $C_2$, $\beta$ such that for any estimator $\hat V$

(24)  $\sup\{R(V(x_0), \hat V(x_0)) : V \in C_\gamma^+, g \in C_\beta\} \ge K \max\{n^{-2\gamma/(2\gamma+1)}, n^{-4\beta/(2\beta+1)}\}.$

Note that $n^{-2\gamma/(2\gamma+1)} = o(n^{-4\beta/(2\beta+1)})$ for $\beta < \gamma/(2\gamma + 2)$. It thus follows
from (14) in our Theorem 2.8 that for any $\gamma/(4\gamma + 2) < \beta < \gamma/(2\gamma + 2)$ and
n sufficiently large

(25)  $\sup\{R(V(x_0), \hat V_{h_n}(x_0)) : V \in C_\gamma^+, g \in C_\beta\} \ll K \max\{n^{-2\gamma/(2\gamma+1)}, n^{-4\beta/(2\beta+1)}\},$

where $h_n$ is yet again the optimal bandwidth. This contradicts the assertion
in Hall and Carroll [20], and shows that their assertion (2.13) is in error,
as is the argument supporting it that follows (C.3) of their article. For a
similar commentary see also Müller and Stadtmüller [28]. Because of this
contradiction it is necessary to give an independent statement and proof of
a lower bound for the minimax risk. That is the goal of this section, where
we treat the case in which $\beta > \gamma/(4\gamma + 2)$. The minimax lower bound for
the case in which $\beta < \gamma/(4\gamma + 2)$ requires different methods which are more
sophisticated. That case, as well as some further generalizations, has been
treated in Wang, Brown, Cai and Levine [42] as a sequel to the present
paper. That paper proves ratewise sharp lower and upper bounds for the
case where $\beta < \gamma/(4\gamma + 2)$.
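The comparison of the two exponents in the claimed bound reduces to elementary algebra: $4\beta/(2\beta+1) < 2\gamma/(2\gamma+1)$ exactly when $\beta < \gamma/(2\gamma+2)$. The sketch below checks this with exact rational arithmetic; the function name `claimed_bound_dominates` is hypothetical.

```python
from fractions import Fraction


def claimed_bound_dominates(beta, gamma):
    """True when n^{-4 beta/(2 beta + 1)} decays strictly slower than n^{-2 gamma/(2 gamma + 1)}."""
    b, g = Fraction(beta), Fraction(gamma)
    claimed_exponent = 4 * b / (2 * b + 1)
    attained_exponent = 2 * g / (2 * g + 1)
    return claimed_exponent < attained_exponent
```

For $\gamma = 2$ the contradiction region is $1/5 < \beta < 1/3$: there the attained rate $n^{-4/5}$ is strictly faster than the claimed lower bound.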

We have treated both mean squared error at a point (in Theorem 2.8) 
and integrated mean squared error (in Theorem 2.7). Correspondingly, we 
provide statements of lower bounds on the minimax rate for each of these 
cases. The local version of the lower bound result for the minimax risk is 
obtained under the assumption of normality of errors $\varepsilon_i$. See Section 2 for the
definition of R and other quantities that appear in the following statements. 

Theorem 4.1. Consider the nonparametric regression problem described
by (1). Fix $C_1$, $C_2$, $\beta$ and $\gamma$ to define functional classes $C_\gamma^+$, $C_\beta$ according
to Definition 2.2. Also assume that $\varepsilon_i \sim N(0, 1)$ and independent. Then there is a
constant $K > 0$ such that

(26)  $\inf\{\sup\{R(V, \hat V) : V \in C_\gamma^+, g \in C_\beta\} : \hat V\} \ge K n^{-2\gamma/(2\gamma+1)},$

where the inf is taken over all possible estimators of the variance function V.

Our argument relies on the so-called "two-point" argument, introduced 
and extensively analyzed in Donoho and Liu [11, 12]. 

Theorem 4.2. Consider the nonparametric regression problem described
by (1). Fix $C_1$, $C_2$, $\beta$ and $\gamma$ to define functional classes $C_\gamma^+$, $C_\beta$ according
to Definition 2.2. Also assume that $\varepsilon_i \sim N(0, 1)$ and independent. Then there is a
constant $K > 0$ such that

(27)  $\inf\{\sup\{R(V(x_0), \hat V(x_0)) : V \in C_\gamma^+, g \in C_\beta\} : \hat V\} \ge K n^{-2\gamma/(2\gamma+1)},$

where the inf is taken over all possible estimators of the variance function V.
Proof. It is easier to begin with the proof of Theorem 4.2 and then
proceed to the proof of Theorem 4.1. We will use a two-point modulus-of- 
continuity argument to establish the lower bound. Such an argument was 
pioneered by Donoho and Liu [11, 12] for a different though related problem. 
See also Hall and Carroll [20] and Fan [16]. 

We assume without loss of generality that $x_0 = 0$. Define the function

(28)  $h(t) = \begin{cases} 2 - |t|^{\gamma}, & \text{if } 0 \le |t| \le 1, \\ (2 - |t|)^{\gamma}, & \text{if } 1 < |t| \le 2, \\ 0, & \text{if } |t| > 2. \end{cases}$

Assume (for convenience only) that $C_1 > 2$. Let d be a constant satisfying
$0 < d < C_2$ and let

(29)  $f_{\delta,\theta}(x) = d + \theta \delta\, h\left( \frac{x}{\delta^{1/\gamma}} \right).$

Then $f_{\delta,\pm 1} \in C_\gamma^+$ for $\delta > 0$ sufficiently small. Let H denote the Hellinger
distance between densities, that is, for any two probability densities $m_1$, $m_2$
dominated by a measure $\mu(dz)$,

(30)  $H^2(m_1, m_2) = \int \left( \sqrt{m_1(z)} - \sqrt{m_2(z)} \right)^2 \mu(dz).$

Here are two basic facts about this metric that will be used below. If $Z =
\{Z_j : j = 1, \ldots, n\}$ where the $Z_j$ are independent with densities $\{m_{kj} : j =
1, \ldots, n\}$, $k = 1, 2$ and $m_k = \prod_j m_{kj}$ denotes the product density, then

(31)  $H^2(m_1, m_2) \le \sum_{j=1}^{n} H^2(m_{1j}, m_{2j});$

and if $m_i$ are univariate normal densities with mean 0 and variance $\sigma_i^2$,
$i = 1, 2$, then

(32)  $H^2(m_1, m_2) \le 2 \left( \frac{\sigma_1^2}{\sigma_2^2} - 1 \right)^2.$

For more details see Brown and Low [3] and Brown et al. [1]. 
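The bound (32) can be checked numerically against the closed form of the squared Hellinger distance between two zero-mean normal densities, $H^2 = 2\,(1 - \sqrt{2\sigma_1\sigma_2/(\sigma_1^2 + \sigma_2^2)})$, which follows from the definition (30). The function names below are hypothetical and the check is only an illustration on a few variance ratios.

```python
import math


def hellinger_sq_normal(var1, var2):
    """Squared Hellinger distance (30) between N(0, var1) and N(0, var2)."""
    # Integral of sqrt(m1 * m2), the Hellinger affinity of the two densities
    affinity = math.sqrt(2.0 * math.sqrt(var1 * var2) / (var1 + var2))
    return 2.0 * (1.0 - affinity)


def bound_32(var1, var2):
    """Right-hand side of (32)."""
    return 2.0 * (var1 / var2 - 1.0) ** 2
```

Near equal variances the distance is much smaller than the bound, which is all the two-point argument requires.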

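It follows that if $m_k$, $k = 1, 2$, are the joint densities of the observations
$\{x_i, Y_i, i = 1, \ldots, n\}$ of (1) with $g = 0$ and $f_k = f_{\delta,(-1)^k}$ then

(33)  $H^2(m_1, m_2) \le 2 \sum_{i=1}^{n} \left( \frac{f_1(x_i)}{f_2(x_i)} - 1 \right)^2 = O(n \delta^{(2\gamma+1)/\gamma}).$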
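The two-point construction can be made concrete as follows. The sketch defines the bump function (28), the pair of perturbed variance functions $d \pm \delta h(x/\delta^{1/\gamma})$, and the sum over the design of the per-observation bound (32), combined as in (31). The design $x_i = i/(n+1)$ and all function names are assumptions made for illustration.

```python
def bump(t, gamma):
    """The function h(t) of (28)."""
    a = abs(t)
    if a <= 1.0:
        return 2.0 - a ** gamma
    if a <= 2.0:
        return (2.0 - a) ** gamma
    return 0.0


def perturbed_variance(x, d, delta, theta, gamma):
    """d + theta * delta * h(x / delta^(1/gamma)), with theta = +1 or -1."""
    return d + theta * delta * bump(x / delta ** (1.0 / gamma), gamma)


def hellinger_sq_upper(n, d, delta, gamma):
    """Sum over x_i = i/(n+1) of the bound (32) for the two perturbed variances."""
    total = 0.0
    for i in range(1, n + 1):
        x = i / (n + 1.0)
        f1 = perturbed_variance(x, d, delta, -1, gamma)
        f2 = perturbed_variance(x, d, delta, +1, gamma)
        total += 2.0 * (f1 / f2 - 1.0) ** 2
    return total
```

Only design points within $2\delta^{1/\gamma}$ of the origin contribute, roughly $n\delta^{1/\gamma}$ of them, each contributing a term of order $\delta^2$; this is the source of the order $n\delta^{(2\gamma+1)/\gamma}$.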
For this setting the Hellinger modulus-of-continuity, $\omega(\cdot)$ (Donoho and Liu
[12], equation (1.1)), is defined as the inverse function corresponding to the
value $H(m_1, m_2)$. Hence it satisfies

(34)  $\omega^{-1}(\delta) = O(n^{1/2} \delta^{(2\gamma+1)/(2\gamma)}).$

Equation (27) then follows, as established in Donoho and Liu [12]. Although 
this completes the proof of Theorem 4.2, we also provide a sketch of the 
argument based on (34). See Donoho and Liu [12] and references cited therein 
for more details. □ 

PROOF of Theorem 4.1. We omit this proof for the sake of brevity. 
It begins from the result in Theorem 4.2 and then follows along the lines 
first described in detail in Donoho, Liu and MacGibbon [13]. This theorem 
can be also viewed as a consequence of the general results on the global 
convergence of nonparametric estimators by Stone [39] and Efromovich [14] 
that do not require normality of errors $\varepsilon_i$. □

APPENDIX 



Proofs of Theorems 2.7 and 2.8. Fix r and functional classes $C_\gamma^+$
and $C_\beta$. For the sake of brevity, we write $\Delta_i = \Delta_{r,i}$. Our main tools in this
proof are the representation (16) of the variance estimator $\hat V_h(x)$ and the
properties (17)-(18). We also use the property

(35)  $\sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} (K_{n;h,x}(x_i))^2 = O\left( \frac{1}{nh} \right).$

(35) follows from (22) and the Cauchy-Schwarz inequality. Here and later,
O is uniform for all $V \in C_\gamma^+$, $g \in C_\beta$ and $\{h\} = \{h_n\}$. Now,

(36)  $E(\Delta_i^2) = \operatorname{Var}(\Delta_i) + (E(\Delta_i))^2,$

where

(37)  $\operatorname{Var}(\Delta_i) = \sum_{j=0}^{r} d_j^2 \operatorname{Var}(y_{j+i-\lfloor r/2 \rfloor}) = V(x_i) + O(n^{-\gamma})$

and

(38)  $E(\Delta_i) = O(n^{-\beta}),$

since $\sum d_j = 0$, $\sum d_j^2 = 1$ and $x_{i+r-\lfloor r/2 \rfloor} - x_{i-\lfloor r/2 \rfloor} = O(n^{-1})$. This provides an
asymptotic bound on the bias as

(39)  $\operatorname{Bias} \hat V_h(x) = \sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} (V(x_i) - V(x)) K_{n;h,x}(x_i) + O(n^{-\gamma}) + O(n^{-2\beta}) = O(h^{\gamma}) + O(n^{-\gamma}) + O(n^{-2\beta}).$

The last step in (39) is a very minor variation of the technique employed in
Wang, Brown, Cai and Levine [42] (see pages 10-11). 

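Next, we need to use the fact that $\Delta_i$ and $\Delta_j$ are independent if $|i - j| \ge
r + 1$. Hence,

$\operatorname{Var}\left( \sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} K_{n;h,x}(x_i) \Delta_i^2 \right)$

$\quad = \sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} \sum_{j=i-r}^{i+r} K_{n;h,x}(x_i) K_{n;h,x}(x_j) \operatorname{Cov}(\Delta_i^2, \Delta_j^2)$

$\quad \le \sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} \sum_{j=i-r}^{i+r} 4^{-1} \left( (K_{n;h,x}(x_i))^2 + (K_{n;h,x}(x_j))^2 \right) \times (\operatorname{Var} \Delta_i^2 + \operatorname{Var} \Delta_j^2).$

It is easy to see that

$\Delta_i^2 = \left( \sum_{j=0}^{r} d_j y_{j+i-\lfloor r/2 \rfloor} \right)^2 = \left( \sum_{j=0}^{r} d_j \sqrt{V(x_{j+i-\lfloor r/2 \rfloor})}\, \varepsilon_{i+j-\lfloor r/2 \rfloor} + O(n^{-\beta}) \right)^2$

and this means, in turn, that

$\operatorname{Var} \Delta_i^2 \le C_2^2 \operatorname{Var}\left( \sum_{j=0}^{r} \varepsilon_{j+i-\lfloor r/2 \rfloor} + O(n^{-\beta}) \right)^2 \le C_2^2 (r + 1) \mu_4 + O(n^{-2\beta}) + O(n^{-4\beta}) = O(1).$

Hence,

(40)  $\operatorname{Var} \hat V_h(x) \le O(1) \sum_{i=\lfloor r/2 \rfloor + 1}^{n + \lfloor r/2 \rfloor - r} \sum_{j=i-r}^{i+r} \left( (K_{n;h,x}(x_i))^2 + (K_{n;h,x}(x_j))^2 \right) = O\left( \frac{1}{nh} \right).$

Combining the bounds in (39) and (40) yields the assertion of the theorem
since $2\beta > \gamma/(2\gamma + 1)$. □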
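The bandwidth order $n^{-1/(2\gamma+1)}$ comes from balancing a squared bias of order $h^{2\gamma}$ against a variance of order $1/(nh)$. The sketch below sets both unknown constants to one (an assumption made purely for illustration) and compares a grid search with the closed-form minimizer $h^* = (2\gamma n)^{-1/(2\gamma+1)}$ of $h^{2\gamma} + 1/(nh)$; the function names are hypothetical.

```python
import numpy as np


def grid_bandwidth(n, gamma, grid=None):
    """Minimize the surrogate risk h^(2 gamma) + 1/(n h) over a logarithmic grid."""
    if grid is None:
        grid = np.logspace(-4, 0, 4001)
    risk = grid ** (2.0 * gamma) + 1.0 / (n * grid)
    return grid[np.argmin(risk)]


def closed_form_bandwidth(n, gamma):
    """Solve 2 gamma h^(2 gamma - 1) = 1/(n h^2), i.e. h^(2 gamma + 1) = 1/(2 gamma n)."""
    return (1.0 / (2.0 * gamma * n)) ** (1.0 / (2.0 * gamma + 1.0))
```

On a log-log scale the closed-form bandwidth has slope $-1/(2\gamma+1)$ in n, and the surrogate risk at that bandwidth is of order $n^{-2\gamma/(2\gamma+1)}$, in line with the two theorems.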

Acknowledgments. We wish to thank T. Cai and L. Wang for pointing 
out the significance of the article of Hall and Carroll [20] and its relation to 
our (14). 